56 research outputs found

    Reply With: Proactive Recommendation of Email Attachments

    Full text link
    Email responses often contain items, such as a file or a hyperlink to an external document, that are attached to or included inline in the body of the message. Analysis of an enterprise email corpus reveals that 35% of the time when users include these items as part of their response, the attachable item is already present in their inbox or sent folder. A modern email client can proactively retrieve relevant attachable items from the user's past emails based on the context of the current conversation and recommend them for inclusion, reducing the time and effort involved in composing the response. In this paper, we propose a weakly supervised learning framework for recommending attachable items to the user. As email search systems are commonly available, we constrain the recommendation task to formulating effective search queries from the context of the conversations. The query is submitted to an existing IR system to retrieve relevant items for attachment. We also present a novel strategy for generating labels from an email corpus, without the need for manual annotations, that can be used to train and evaluate the query formulation model. In addition, we describe a deep convolutional neural network that demonstrates satisfactory performance on this query formulation task when evaluated on the publicly available Avocado dataset and a proprietary dataset of internal emails obtained through an employee participation program.
    Comment: CIKM 2017. Proceedings of the 26th ACM International Conference on Information and Knowledge Management, 2017.
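The query-formulation step described above can be illustrated with a much simpler stand-in than the paper's convolutional network: score the terms of the current conversation by TF-IDF against the user's past emails and keep the top few as the search query. The email texts below are hypothetical, and real systems would also filter stopwords.

```python
import math
import re
from collections import Counter

def formulate_query(context, background_docs, k=5):
    """Pick the k context terms with the highest TF-IDF weight against
    the background collection. A crude illustrative baseline, not the
    learned query-formulation model from the paper."""
    tokenize = lambda t: re.findall(r"[a-z]+", t.lower())
    n_docs = len(background_docs)
    df = Counter()
    for doc in background_docs:
        df.update(set(tokenize(doc)))          # document frequencies
    tf = Counter(tokenize(context))            # term frequencies in context
    scored = {
        term: count * math.log((n_docs + 1) / (df[term] + 1))
        for term, count in tf.items()
        if df[term] > 0                        # only terms retrievable from past mail
    }
    return [t for t, _ in sorted(scored.items(), key=lambda x: -x[1])[:k]]

emails = [
    "please review the budget spreadsheet before friday",
    "lunch on friday sounds good",
    "the quarterly budget numbers look fine",
]
query = formulate_query("can you resend the budget spreadsheet", emails, k=3)
```

The resulting query would then be submitted to the existing email search system, as the abstract describes.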

    Prediction of Learning Curves in Machine Translation

    Get PDF
    Abstract Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific purpose. Since ad-hoc manual translation can represent a significant investment in time and money, a prior assessment of the amount of training data required to achieve a satisfactory accuracy level can be very useful. In this work, we show how to predict what the learning curve would look like if we were to manually translate increasing amounts of data. We consider two scenarios: 1) monolingual samples in the source and target languages are available, and 2) an additional small amount of parallel corpus is also available. We propose methods for predicting learning curves in both these scenarios.
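A common parametric form for such learning curves is a saturating inverse power law. The sketch below, with hypothetical BLEU measurements, fits accuracy(n) = a - b * n^(-c) to a few small-sample points and extrapolates; it is an illustration of the idea, not the paper's actual estimator.

```python
import numpy as np

def fit_learning_curve(sizes, scores):
    """Fit accuracy(n) = a - b * n**(-c): scan c coarsely and, for each
    candidate c, solve for a and b by ordinary least squares."""
    best = None
    for c in np.linspace(0.05, 1.0, 96):
        A = np.column_stack([np.ones(len(sizes)), -sizes ** (-c)])
        sol, *_ = np.linalg.lstsq(A, scores, rcond=None)
        err = float(np.sum((A @ sol - scores) ** 2))
        if best is None or err < best[0]:
            best = (err, sol[0], sol[1], c)
    _, a, b, c = best
    return a, b, c

# Hypothetical BLEU scores measured on increasing training subsets.
sizes = np.array([1e3, 2e3, 5e3, 1e4, 2e4])
bleu = np.array([18.0, 20.5, 23.1, 24.8, 26.2])
a, b, c = fit_learning_curve(sizes, bleu)
bleu_at_100k = a - b * 1e5 ** (-c)   # extrapolated score at 100k sentence pairs
```

The fitted asymptote `a` gives a rough ceiling, and the curve answers the budgeting question directly: how much more manual translation is needed to reach a target score.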

    Learning Structural Kernels for Natural Language Processing

    Get PDF
    Structural kernels are a flexible learning paradigm that has been widely used in Natural Language Processing. However, the problem of model selection in kernel-based methods is usually overlooked. Previous approaches mostly rely on setting default values for kernel hyperparameters or on grid search, which is slow and coarse-grained. In contrast, Bayesian methods allow efficient model selection by maximizing the evidence on the training data through gradient-based methods. In this paper we show how to perform this model selection in the context of structural kernels by using Gaussian Processes. Experimental results on tree kernels show that this procedure results in better prediction performance compared to hyperparameter optimization via grid search. The framework proposed in this paper can be adapted to other structures besides trees, e.g., strings and graphs, thereby extending the utility of kernel-based methods.
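The evidence maximization mentioned above rests on the standard GP log marginal likelihood and its hyperparameter gradient. As a minimal sketch, the snippet below uses an RBF kernel on toy vector data in place of the paper's tree kernels; the formulas are the usual GP ones, with the lengthscale playing the role of a kernel hyperparameter.

```python
import numpy as np

def log_evidence_and_grad(X, y, ell, noise=0.1):
    """Log marginal likelihood (evidence) of a GP regressor with an RBF
    kernel, and its gradient w.r.t. the lengthscale ell."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K_f = np.exp(-d2 / (2 * ell ** 2))
    K = K_f + noise ** 2 * np.eye(len(X))
    K_inv = np.linalg.inv(K)
    alpha = K_inv @ y
    _, logdet = np.linalg.slogdet(K)
    lml = -0.5 * y @ alpha - 0.5 * logdet - 0.5 * len(X) * np.log(2 * np.pi)
    dK = K_f * d2 / ell ** 3                      # dK / d ell
    grad = 0.5 * alpha @ dK @ alpha - 0.5 * np.trace(K_inv @ dK)
    return lml, grad

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))              # toy inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
lml, grad = log_evidence_and_grad(X, y, ell=0.3)
```

Handing this gradient to any gradient-based optimizer tunes the hyperparameter continuously, which is exactly what a coarse grid search cannot do.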

    Private access to phrase tables for statistical machine translation

    No full text
    Abstract Some Statistical Machine Translation systems never see the light of day because the owner of the appropriate training data cannot release it, and the potential user of the system cannot disclose what should be translated. We propose a simple and practical encryption-based method that addresses this barrier.
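The abstract does not spell out the scheme, but the general flavor of encryption-based table access can be sketched as follows: index phrase-table entries under a keyed hash of the source phrase and encrypt the translations, so that a server holding the table learns neither phrases nor translations, while a client holding the shared key can still look entries up. Everything here (the key, the toy phrases, the XOR construction) is an illustrative assumption, not the paper's actual protocol.

```python
import hashlib
import hmac

KEY = b"shared-secret"   # hypothetical key held by data owner and client

def lookup_key(phrase: str) -> str:
    # Deterministic keyed hash: the server can index on it without
    # learning the source phrase.
    return hmac.new(KEY, b"lookup|" + phrase.encode(), hashlib.sha256).hexdigest()

def keystream(phrase: str, n: int) -> bytes:
    # Expand (KEY, phrase) into n pseudorandom bytes for XOR encryption.
    out, counter = b"", 0
    while len(out) < n:
        msg = b"enc|%d|" % counter + phrase.encode()
        out += hmac.new(KEY, msg, hashlib.sha256).digest()
        counter += 1
    return out[:n]

def encrypt(phrase: str, translation: str) -> bytes:
    data = translation.encode()
    return bytes(a ^ b for a, b in zip(data, keystream(phrase, len(data))))

def decrypt(phrase: str, blob: bytes) -> str:
    return bytes(a ^ b for a, b in zip(blob, keystream(phrase, len(blob)))).decode()

# Data owner publishes only hashed keys and encrypted values.
table = {lookup_key(s): encrypt(s, t)
         for s, t in {"la maison": "the house", "le chat": "the cat"}.items()}
# A client holding KEY can query without exposing the phrase in the clear.
translation = decrypt("la maison", table[lookup_key("la maison")])
```

Note that a deterministic lookup key still leaks query repetition; real private-access schemes address such leakage.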

    Experiments with Corpus-based LFG Specialization

    No full text
    Sophisticated grammar formalisms, such as LFG, allow complex linguistic phenomena to be captured concisely. The powerful operators provided by such formalisms can, however, introduce spurious ambiguity, making parsing inefficient. A simple form of corpus-based grammar pruning is evaluated experimentally on two wide-coverage grammars, one English and one French. Speedups of up to a factor of 6 were obtained, at a cost in grammatical coverage of about 13%. A two-stage architecture allows significant speedups to be achieved without introducing additional parse failures. 1 Introduction Expressive grammar formalisms allow grammar developers to capture complex linguistic generalizations concisely and elegantly, thus greatly facilitating grammar development and maintenance. Carroll (1994) found that the empirical performance when parsing with unification-based grammars is nowhere near the theoretical worst-case complexity. Nonetheless, directly parsing with such grammars, in the form they were developed, …
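The simplest form of corpus-based pruning that the abstract alludes to, discarding rules rarely or never used in corpus parses, can be sketched in a few lines. The CFG-style productions below are hypothetical stand-ins for LFG rules.

```python
from collections import Counter

def prune_grammar(rules, corpus_rule_uses, min_count=1):
    """Keep only rules used at least min_count times in parses of the
    training corpus. Pruned rules trade coverage for parsing speed."""
    usage = Counter(corpus_rule_uses)
    return [r for r in rules if usage[r] >= min_count]

rules = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("Det", "Adj", "Adj", "Adj", "N")),   # rarely needed in practice
    ("VP", ("V", "NP")),
]
observed = [                                      # rule uses seen in corpus parses
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP")),
    ("S", ("NP", "VP")), ("NP", ("Det", "N")),
]
pruned = prune_grammar(rules, observed)
```

A two-stage architecture like the one mentioned above would first try the pruned grammar and fall back to the full grammar on parse failure, recovering the lost coverage.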

    Corpus-based Grammar Specialization

    No full text
    Broad-coverage grammars tend to be highly ambiguous. When such grammars are used in a restricted domain, it may be desirable to specialize them, in effect trading some coverage for a reduction in ambiguity. Grammar specialization is here given a novel formulation as an optimization problem, in which the search is guided by a global measure combining coverage, ambiguity and grammar size. The method, applicable to any unification grammar with a phrase-structure backbone, is shown to be effective in specializing a broad-coverage LFG for French. 1 Introduction Expressive grammar formalisms allow grammar developers to capture complex linguistic generalizations concisely and elegantly, thus greatly facilitating grammar development and maintenance. Broad-coverage grammars, however, tend to overgenerate considerably, thus allowing large amounts of spurious ambiguity. If the benefits resulting from more concise grammatical descriptions are to outweigh the costs of spurious ambiguity, the latter…
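The optimization formulation described above can be sketched as a greedy search over rule subsets, guided by a global objective that rewards coverage and penalizes ambiguity and grammar size. The objective weights, the `evaluate` callback, and the toy rules are all illustrative assumptions; the paper's actual measure and search strategy are not reproduced here.

```python
def objective(coverage, ambiguity, size, alpha=1.0, beta=0.01):
    # Global measure: reward coverage, penalize ambiguity and grammar size.
    return coverage - alpha * ambiguity - beta * size

def specialize(rules, evaluate):
    """Greedy hill-climbing: repeatedly drop the rule whose removal most
    improves the global objective, until no removal helps."""
    current = set(rules)
    score = objective(*evaluate(current))
    improved = True
    while improved:
        improved = False
        for r in list(current):
            trial = current - {r}
            s = objective(*evaluate(trial))
            if s > score:
                current, score, improved = trial, s, True
    return current

# Toy evaluation: rules A and B are essential for coverage; C and D only
# add ambiguity. A real evaluate() would parse a development corpus.
def evaluate(grammar):
    coverage = len(grammar & {"A", "B"}) / 2
    ambiguity = 0.3 * len(grammar & {"C", "D"})
    return coverage, ambiguity, len(grammar)

specialized = specialize({"A", "B", "C", "D"}, evaluate)
```

The search keeps the essential rules and discards the ambiguity-only ones, which is the coverage-for-ambiguity trade the abstract describes.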

    Assessing Quick Update Methods of Statistical Translation Models

    No full text
    No abstract available.